Collaborative Filtering and Heavy-Tailed Degree Distributions
Abstract
Common techniques in collaborative filtering rely on finding low-rank matrix approximations to the adjacency matrix (the ratings that users assign to items), essentially representing users and items by a small number of latent features. One issue that arises in many real-world collaborative filtering datasets is that the number of observed entries per row/column follows a heavy-tailed distribution. For instance, in the Amazon product ratings dataset, the maximum degree of a product is 12180, whereas the average number of ratings per product is 4.68. We show that these over-represented rows/columns significantly alter the spectrum of the adjacency matrix, which degrades the performance of SVD-based methods for low-rank approximation. Further, we present an experimental evaluation showing that discarding high-degree rows/columns improves performance across several datasets (Amazon product ratings, movie ratings, and book ratings).
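The following is a minimal NumPy sketch of the idea described in the abstract, not the authors' code: it measures item degrees (observed ratings per column), computes a truncated SVD, and discards high-degree rows/columns before recomputing the spectrum. Several details are assumptions made only for illustration: the synthetic data, the hypothetical cutoff rule in `drop_high_degree` (drop anything whose degree exceeds `factor` times the mean degree), and treating missing ratings as zeros in the SVD.

```python
import numpy as np

def degree_stats(R):
    """Return (max, mean) number of observed (non-zero) entries per column (item)."""
    deg = np.count_nonzero(R, axis=0)
    return deg.max(), deg.mean()

def drop_high_degree(R, factor=5.0):
    """Hypothetical cutoff rule (assumption): drop rows/columns whose degree
    exceeds `factor` times the mean row/column degree."""
    row_deg = np.count_nonzero(R, axis=1)
    col_deg = np.count_nonzero(R, axis=0)
    keep_rows = row_deg <= factor * row_deg.mean()
    keep_cols = col_deg <= factor * col_deg.mean()
    return R[np.ix_(keep_rows, keep_cols)], keep_rows, keep_cols

def rank_k_approx(R, k):
    """Truncated SVD: best rank-k approximation in Frobenius norm, plus all
    singular values. Missing ratings are treated as zeros (a simplification)."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :], s

# Synthetic data (assumption, not from the paper): sparse 1-5 star ratings plus
# one "blockbuster" item rated by every user, giving a heavy-tailed item degree.
rng = np.random.default_rng(0)
n_users, n_items = 200, 50
R = np.zeros((n_users, n_items))
mask = rng.random((n_users, n_items)) < 0.05
R[mask] = rng.integers(1, 6, size=mask.sum())
R[:, 0] = rng.integers(1, 6, size=n_users)  # column 0: every user has rated it

max_deg, mean_deg = degree_stats(R)
_, s_full = rank_k_approx(R, k=5)

R_filtered, keep_rows, keep_cols = drop_high_degree(R)
_, s_filtered = rank_k_approx(R_filtered, k=5)

print(f"item degree: max={max_deg}, mean={mean_deg:.2f}")
print("top singular values, full:    ", np.round(s_full[:3], 2))
print("top singular values, filtered:", np.round(s_filtered[:3], 2))
```

Deleting rows or columns can never increase the largest singular value, so `s_filtered[0] <= s_full[0]` always holds. In this synthetic example the blockbuster column alone has a norm of roughly sqrt(200 · 11) ≈ 47 (11 being the mean squared rating for uniform 1 to 5 stars), which pins the leading singular value at or above that level regardless of the rest of the data.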
Similar resources
A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation
Recommender systems use information retrieval and machine learning techniques to filter information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in collaborative filtering based recommender systems. In order to improve the accuracy of traditional user-based collaborative filtering techniques under the new-user cold-start problem a...
Nonparametric Signed-Rank Shewhart Control Chart with Variable Sampling Interval
A nonparametric control chart based on ranks is used for detecting changes in the median (mean). In this article, the signed-rank control chart is considered with a variable sampling interval. We compared the performance of the signed-rank chart with variable sampling interval (VSI-SR) to the signed-rank chart with fixed sampling interval (FSI-SR); the numerical results demonstrated that the VSI feature is very useful. Bakir[1] showed ...
Notes: Social Networks: Models, Algorithms, and Applications
We specifically discussed the power-law degree distribution, which has degree distribution $p_k = C \cdot k^{-\alpha}$, where $C$ is a constant and $\alpha > 1$. While not all degree distributions follow a power law, many degree distributions of observed networks are heavy-tailed or long-tailed. A heavy-tailed distribution is a distribution that is "heavier" than the exponential distribution. Here...
A Novel Fuzzy-Based Similarity Measure for Collaborative Filtering to Alleviate the Sparsity Problem
Memory-based collaborative filtering is the most popular approach to build recommender systems. Despite its success in many applications, it still suffers from several major limitations, including data sparsity. Sparse data affect the quality of the user similarity measurement and consequently the quality of the recommender system. In this paper, we propose a novel user similarity measure based...
Modeling and Analysis of Heavy-tailed Distributions via Classical Teletraffic Methods
We propose a new methodology for modeling and analyzing heavy-tailed distributions, such as the Pareto distribution, in communication networks. The basis of our approach is a fitting algorithm which approximates a heavy-tailed distribution by a hyperexponential distribution. This algorithm possesses several key properties. First, the approximation can be achieved within any desired degree of accu...
Publication date: 2013